
Conversation

@athornton

Except for cadc-postgresql-dev, which would require an aarch64 pgsphere RPM (and ideally a Fedora 42 PostgreSQL 15 aarch64 RPM).

I also slightly tweaked the Dockerfile for cadc-postgresql-dev, because an install script for the repo expected /var/lib/pgsql to exist already. This is probably better fixed with a mkdir -p inside that scriptlet than what I did.
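A minimal sketch of that sturdier fix, assuming the install scriptlet runs during the image build (the path comes from the comment above; everything else is illustrative):

```dockerfile
# Hypothetical hardening: create the directory the repo's install script
# expects before it runs, instead of patching the Dockerfile around it.
RUN mkdir -p /var/lib/pgsql
```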

Context: I work for Rubin Observatory, and we are investigating moving the Rubin Science Platform entirely to the arm64 architecture. I'm ensuring I have containers for everything I need to do that.

Also note that this pushes to ghcr.io under docker-base- and uses seconds-since-the-epoch as its tag; clearly CADC does not prepare its official releases via GitHub Actions, and I don't know how that process works. You could pull the image from ghcr.io, retag it, and push it, or you could rework the GHA workflow to add a release tag and push to the correct destination, provided you put a credential for that destination in your GitHub secrets.

That might be desirable, in that the matrix build strategy I'm using parallelizes a lot of the work. You could also just do docker buildx build with --platform, but that will run your build stages sequentially for each architecture, and all but one of the architectures (presumably, all but amd64) will be emulated under QEMU and therefore a lot slower. If you're only doing a handful of releases a year, you might not care about build speed.

Note also that this is pushing architecture-specific images to the repository and doing the fixup for a multiplatform manifest in a separate job. This is not as bad as it seems, because for a multiplatform image, you're going to need all the layers from each image anyway. Thus the actual overhead is only whatever the metadata for two more image names (...-amd64 and -arm64) is plus the unifying manifest (a couple of KB).
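The separate fixup job can be as small as one docker buildx imagetools invocation; a sketch, where the image names and tags are placeholders and the assumption is that the matrix jobs already pushed the per-architecture images:

```shell
# Stitch the arch-specific images into one multiplatform manifest.
# ghcr.io/example/docker-base and 1.0 are placeholder name/tag.
docker buildx imagetools create \
  --tag ghcr.io/example/docker-base:1.0 \
  ghcr.io/example/docker-base:1.0-amd64 \
  ghcr.io/example/docker-base:1.0-arm64
```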

We're doing it that way at Rubin Observatory so we can build arm64 natively and in parallel with amd64. Also, some of our images are too large to fit in the GitHub Actions cache, so we need to push the images in order to do a HEAD request and get a layer manifest out of them.
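That HEAD request is just the OCI distribution API's manifest endpoint; a hedged sketch with curl, where the token handling and image path are placeholders (ghcr.io's token exchange details may differ):

```shell
# HEAD the manifest endpoint to get digest/size headers without
# downloading any layers. $TOKEN and the image path are placeholders.
curl -sI \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.oci.image.index.v1+json" \
  https://ghcr.io/v2/example/docker-base/manifests/latest
```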

@pdowler
Member

pdowler commented Oct 23, 2025

First, thanks for the effort and explanation. I do support the idea of multiplatform builds so deployers have more options. It will take me a little time to grok the changes.

I should note that this workflow is set up as a simple build test (for pull requests) so that I don't have to review things that don't build :-) In repositories with code, such PR workflows also run unit tests.

PR workflows do not try to deploy and run integration tests... that is currently left to the developer and the release process. We have ambitions to automate more of that, but it is unclear exactly how to approach it.

A quick glance tells me this pushes images to ghcr.io; someone could pull and use them (in this case, the main ones here are base images, so "pull and use" means building and testing something else). So my initial questions for right now:

  • are you looking for a "production ready" image that could be used?
  • should this be in a workflow that triggers when the PR is merged into main instead?
  • or do we really want to push every time a PR is updated because that image is useful for something?
  • do we (opencadc) need our own credentials to push the image(s) to ghcr? (haven't grokked the details of the setup/env)

As I saw in a comment: yeah, we publish production images to our own (harbor) service, but I'm not fully committed to that. How is ghcr working out for you? I understood there are rate limits for pulling (at least in the free tier)... are the limits high enough that it doesn't matter or do you pay to get past that?

@pdowler
Member

pdowler commented Oct 24, 2025

I think I buried my questions too deeply above :-)

As I saw in a comment: yeah, we publish production images to our own (harbor) service, but I'm not fully committed to that.

How is ghcr working out for you?

I understood there are rate limits for pulling (at least in the free tier)... are the limits high enough that it doesn't matter or do you pay to get past that?

And does Rubin care about signed images? Do you sign your own or verify trust on the ones we publish via images.opencadc.org? We do sign (notary v1) and are debating notary v2 aka notation vs cosign...

@athornton
Author

GHCR is working well for us. We have hit the rate limit, but only during scale testing (I think we hit the ghcr.io limit at about 500 simultaneously launched users all asking for the same image). The great majority of the Rubin Data Management stuff lives there now (we have a bad feeling about Docker-the-company and don't want to rely heavily on Docker Hub). My group, since we run some Rubin Science Platform instances on Google Cloud, also pushes to Google Artifact Registry, but in general we use ghcr.io as our registry of record.

We are building our own images on top of these. Stelios Voutsinas can give you many more details.

We haven't given a lot of thought to image signing or attestation.

I think pushing on merge to main, or better yet, on release-associated-with-new-tag, would be best. What most of our internal services actually do is more like:

    # Only do Docker builds of tagged releases and pull requests from ticket
    # branches.  This will still trigger on pull requests from untrusted
    # repositories whose branch names match our tickets/* branch convention,
    # but in this case the build will fail with an error since the secret
    # won't be set.
    if: >
      (github.event_name == 'release' && github.event.action == 'published')
      || (github.event_name != 'merge_group'
          && (startsWith(github.head_ref, 'tickets/')
              || startsWith(github.head_ref, 't/')))

So: new GitHub release, or a push to a PR from a particularly-formatted branch name.

But that last trigger exists so that we can run tests against images built from branch builds, which may or may not be too much uploading and too much churn for you.

One thing that strikes me about pgsphere, which is what's keeping us from immediately doing cadc-postgresql-dev too: how committed to Fedora are you? I ask because Debian has pgsphere-postgresql15 (I might be wrong about the exact package name) already; at any rate, you don't have to build the pgsphere extension by hand in Debian-world, because the package already exists in the base repo.

@pdowler
Member

pdowler commented Oct 27, 2025

Thanks for the info about ghcr. We are thinking about if/how to use it and will be discussing in the next few days.

The cadc-postgresql-dev image is really just for developer convenience, so the Fedora usage is just because it is easiest for me (that's what I use) to build pgsphere RPMs there: Fedora includes all the obscure dependencies and tools, and I know how to use that system (that's less true of other RedHat-adjacent distros). It's also easier for me to install server packages and provide a simple db-init. I could probably figure out how to cross-compile pgsphere for arm64, but no one has asked for that before. Since it is for devs only, that's the only audience I'd want to hear from (I don't want to support production db servers :-)

The Fedora usage in production (cadc-java and cadc-tomcat base images) is because it is the easiest and most robust way for me to install the required software (including erfa and wcslib, which are used by a number of our downstream components and are not in many distro repos)...

In general, I know and trust the upstream (Fedora), so I am quite "committed" personally... but if someone else were going to maintain a specific image indefinitely, I would let them decide. Right now it's me doing it with minimal time, so I do what's easiest and most reliable.

@athornton
Author

That's fair. I think Stelios said we didn't use the postgresql-dev image ourselves, so I'm reasonably confident that (if we rebuild against the ghcr.io ARM images I pushed) I can meet our short term goals of getting an entire RSP instance on arm64.

In the longer term, multiplatform builds of everything feel like a good idea to me, but (as outlined above) the more standard approach of just passing multiple platforms to docker buildx build might make more sense for you. Our use case, building very large (and time-consuming) images on a daily cadence, meant that Rubin sank the effort into parallelizing the builds; but if you're building a few not-huge images every once in a while, docker buildx build with multiple arguments to --platform is the more straightforward way to get there.

@pdowler
Member

pdowler commented Oct 28, 2025

Using your buildx idea above, I have been able to add multi-arch build capability to our internal release process, so I am confident we will be able to publish amd64 + arm64 images. A coworker with a recent Mac was able to run the arm64 version natively.

We are also discussing automating publishing in the GitHub CI, but these images are pretty small and not built that frequently, so we can probably do this the simple way. I was also inspired by your ghcr info to consider building and pushing to ghcr (e.g. for testing) and using images.opencadc.org for fully tested and supported images (and where we deploy from)... if we do that, we might want to consider the matrix build.

So, maybe for now we just want the CI to build multi-arch with

    docker buildx build --platform linux/amd64,linux/arm64 ...

just to verify it works, but not push for now. We'll figure out what else we want to automate separately.

Until that happens, I will update our internal image publishing to do multi-arch and should be able to have fully tested images very soon.

@athornton
Author

Thank you! While having arm64 images is important to me, how they happen really isn't. I think --platform is a perfectly reasonable choice for your use case.

I've gotten very fond of driving releases from Git tags (via GitHub) because, once set up correctly (which can be an adventure), it's much less error-prone than doing it manually. Note that the docker login action is perfectly capable of logging in to arbitrary registries via its registry input (although it's pretty clunky), so you could drive your pushes to images.opencadc.org from GitHub Actions if you wanted to.
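For instance, a sketch of such a login step; the registry host is taken from the thread, but the secret names are placeholder assumptions, not anything defined in this repo:

```yaml
# Illustrative only: docker/login-action pointed at a non-default registry.
- name: Log in to images.opencadc.org
  uses: docker/login-action@v3
  with:
    registry: images.opencadc.org
    username: ${{ secrets.OPENCADC_HARBOR_USER }}
    password: ${{ secrets.OPENCADC_HARBOR_TOKEN }}
```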

@pdowler
Member

pdowler commented Oct 28, 2025

Making a more minimal PR for multi-arch and updated version tags. When I get back from ADASS+IVOA, I will look into using this as the main build and automating pushing to ghcr and/or images.opencadc.org... we are also in the midst of changing image signing from notary v1/DCT... please keep this PR open and convert it to draft; I will want to look at it more closely in a few weeks.

Will publish updated base images later today.

I will poach your postgresql fix - looks like the package on postgresql.org is missing a mkdir...

@athornton athornton marked this pull request as draft October 30, 2025 03:02
@athornton
Author

Converted to draft.
